SESA: Supervised Explicit Semantic Analysis

نویسندگان

  • Dasha Bogdanova
  • Majid Yazdani
چکیده

In recent years supervised representation learning has provided state of the art or close to the state of the art results in semantic analysis tasks including ranking and information retrieval. The core idea is to learn how to embed items into a latent space such that they optimize a supervised objective in that latent space. The dimensions of the latent space have no clear semantics, and this reduces the interpretability of the system. For example, in personalization models, it is hard to explain why a particular item is ranked high for a given user profile. We propose a novel model of representation learning called Supervised Explicit Semantic Analysis (SESA) that is trained in a supervised fashion to embed items to a set of dimensions with explicit semantics. The model learns to compare two objects by representing them in this explicit space, where each dimension corresponds to a concept from a knowledge base. This work extends Explicit Semantic Analysis (ESA) [8] with a supervised model for ranking problems. We apply this model to the task of Job-Profile relevance in LinkedIn in which a set of skills defines our explicit dimensions of the space. Every profile and job are encoded to this set of skills their similarity is calculated in this space. We use RNNs to embed text input into this space. In addition to interpretability, our model makes use of the web-scale collaborative skills data that is provided by users for each LinkedIn profile. Our model provides state of the art result while it remains interpretable.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Discovering Semantically Similar Associations (SeSA) for Complex Mappings between Conceptual Models

There is an increasing demand for discovering meaningful relationships, i.e., mappings, between conceptual models for interoperability. Current solutions have been focusing on the discovery of correspondences between elements in different conceptual models. However, a complex mapping associating a structure connecting a set of elements in one conceptual model with a structure connecting a set o...

متن کامل

Lump at SemEval-2017 Task 1: Towards an Interlingua Semantic Similarity

This is the Lump team participation at SemEval 2017 Task 1 on Semantic Textual Similarity. Our supervised model relies on features which are multilingual or interlingual in nature. We include lexical similarities, cross-language explicit semantic analysis, internal representations of multilingual neural networks and interlingual word embeddings. Our representations allow to use large datasets i...

متن کامل

Cross-Lingual Dataless Classification for Many Languages

Dataless text classification [Chang et al., 2008] is a classification paradigm which maps documents into a given label space without requiring any annotated training data. This paper explores a crosslingual variant of this paradigm, where documents in multiple languages are classified into an English label space. We use CLESA (cross-lingual explicit semantic analysis) to embed both foreign lang...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1708.03246  شماره 

صفحات  -

تاریخ انتشار 2017